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The Arnold diffusion constitutes a dynamical phenomenon which may occur in the phase space of a non- 
integrable Hamiltonian system whenever the number of the system degrees of freedom is M > 3. The diffusion 
is mediated by a web-like structure of resonance channels, which penetrates the phase space and allows the 
system to explore the whole energy shell. The Arnold diffusion is a slow process; consequently the mapping of 
the web presents a very time-consuming task. We demonstrate that the exploration of the Arnold web by use of 
a graphic processing unit (GPU)-supercomputer can result in distinct speedups of two orders of magnitude as 
compared to standard CPU-based simulations. 
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Several archetype results of dynamical chaos theory 
can unambiguously be attributed to unforeseen outcomes 
of numerical experiments. Two such examples are the ab- 
sence of thermalization in the Fermi-Pasta-Ulam (FPU) 
chaini and the discovery of the Lorenz attractor^. In some 
cases the use of numerical simulations presents the only 
method to obtain insight into the behavior and evolution 
of the system of interest. There are several branches of 
modern physics, which are computational by their own 
nature, with Computational Cosmology- being such a 
paradigm. Nowadays, computational cosmologists per- 
form simulations with more than 10^° particles^. The 
main driving force underpinning this advance is the high 
parallelization of simulations that allows one to run an 
artificial Universe on thousands of processors simultane- 
ously. On the other hand, one may benefit as well from 
a high parallelization on the scale of much smaller sys- 
tems. For example, the averaging over many realiza- 
tions of a stochastic force or of quenched disorder to ar- 
rive at scaling exponents, the Monte-Carlo sampUng etc., 
present typical routines in many areas of computational 
physics. It is evident that by running A^ different real- 
izations on N processors in parallel, one can speed up 
the statistical collection process by a factor of N. One 
of the possibiUties is then to use a computational clus- 
ter by sending the program to run on many CPUs simul- 
taneously, collecting the output data and finally sample 
them. This "sending-collecting-sampUng" process is cum- 
bersome work, which, however, could be avoided by us- 
ing special designed scripts. The advent of Graphics Pro- 
cessing Units (GPUs) has brought such numerical experi- 
ments into a new level of performance^. Originally used as 
hardware chips, designed as graphic data-pipelines, GPUs 
were soon recognized as an additional beneficial tool: re- 
searchers from such areas as medical imaging, compu- 
tational electromagnetics, and hydrodynamics, have suc- 
cessfully implemented them for data processing*. Nowa- 
days computational physics is marked by an impres- 
sive boost of the "General Purpose Computing on GPU" 
(GPC-GPU) ideology^. With this work we attempt to 
demonstrate how the use of a GPU-supercomputer can 
provide useful insight for complex problems of nonlinear 



dynamics such as Arnold diffusion^. 



I. INTRODUCTION 

The Arnold diffusion is not that kind of phenomenon that 
involves the time evolution of billions of particles. In fact, 
it can appear in the phase space dynamics of a Hamiltonian 
system whenever the number of system degrees of freedom 
is M > 3. The Arnold diffusion is relevant in celestial me- 
chanics and astronomy^, plasma dynamics in stellarators and 
tokamaksiSiil. It also influences the evolution of a Rydberg 
atom placed in crossed magnetic and electric fields^, or it 
might explain experimentally observed effects of emittance 
growth in the TeVatron collidersi^. Moreover, Arnold diffu- 
sion may be a mechanism of the ultraviolet cut-off in statis- 
tical mechanics'**. Except some specially designed models, 
where the presence of the diffusion can be rigorously proved 
and the diffusion timescales can theoretically be estimated, 
only little work is generally analytically possible. Therefore, 
numerical experiments play an prominent role in the studies 
of Arnold diffusioni^— . However, conclusive numerical out- 
put then requires extremely large time scales; - because the 
actual rate of the process is not known ab initio. With this 
study we demonstrate that the presence of the Arnold diffu- 
sion in the dynamics of a model Hamiltonian system can be 
visualized by scanning the system phase space with a giant en- 
semble of trajectories. This has been realized within the Com- 
pute Unified Device Architecture (CUDA) frameworbi^ with 
its benchmarks performed on a NVIDIA Tesla GPU. We de- 
tect the Arnold web^i, i.e. a chaotic network which can carry 
the system over the energy shell, in the limit of extremely 
weak perturbations. We also resolve a rich fractal structure of 
the Arnold web in the regime of moderate non-integrability, 
when the clusters of high-order resonances start to contribute 
to diffusion. 



II. MODEL FOR ARNOLD DIFFUSION 

One of the typical models to study the Arnold diffusion is 
a system of coupled standard maps, see, for example, Refs3. 



Such systems are easy to handle because they only need to be 
iterated and do not demand sophisticated integration schemes. 
However, they represent a class of driven Hamiltonian sys- 
tems, and their exposition to a train of delta-kicks leads to an 
energy pumping and an unlimited diffusion in the momentum 
subspace. Therefore, time evolution of such systems is not 
restricted to a compact manifold in the corresponding phase 
space. 

Here we want to address the case of autonomous Hamilto- 
nian systems. Hamiltonians of such systems represent inter- 
gals of motion, and their time evolution is restricted to mani- 
folds of constant energy. 

An autonomous Hamiltonian system with one degree of 
freedom, M = 1, is always integrable since it possesses 
the integral of motion, E = H{x,p). Hamiltonian systems 
with two degrees of freedom, AI = 2, evolve in a four- 
dimensional phase space, fi = {(a;, y,Pa;,Py)}, but the sys- 
tem evolution is confined to the energy shell of energy E = 
-ff(a;(0),y(0),p2;(0),Pj,(0)), determined by the initial condi- 
tions. Regular two-dimensional invariant manifolds, torfi^ , 
separate the system energy shell into different regions. Dif- 
ferent regions exhibit different dynamics, chaotic or regular 
ones, but each region is perfectly isolated from the remaining 
ones. This is so because the two-dimensional tori provide a 
complete partition of the three-dimensional energy shell. This 
topological argument does not work anymore in higher dimen- 
sions: the A/-dimensional tori cannot partition the (2 A/ — 1)- 
dimensional shell whenever the number of the degrees of free- 
dom M > 3. Therefore, there could be trajectories that slip 
between regular tori and thus are able in exploring the whole 
energy shell. 

To be more specific, we consider a three-dimensional sys- 
tem with a Hamiltonian: 



if (P, X) = — + £Hp(X) = iio(P) + eiip(X), (1) 

where P = {Px,Py,Pz) and X = {x,y,z) denote the mo- 
mentum and coordinate vectors. For e = the system is 
completely integrable, and the vector P is constant along any 
trajectory of the system, X(t) ~ X(0) + Pt. For a given en- 
ergy E the energy shell forms a sphere in the momentum sub- 
space, S : P-^ = 2E, and system trajectories are represented 
by fixed points on the sphere. Consider now the regime of 
weak perturbation, e ^ 1. This implies that the system phase 
space O remains almost completely filled with invariant tori, 
which form the set Jltori. The motion is regular on the man- 
ifold rJtoii but there also appears a tiny manifold, the relative 
complement of Jltori in ft, which constitutes the Arnold web, 
Hweb = J^\J^toii- The Arnold web covers the resonance lines 
^m := {P|m-P = 0}, where m = {mx,rny,mz) is a triplet 
of coprimes, by narrow chaotic channels. A typical channel 
width depends on the order of the corresponding resonance, 
m = max |toq,|, and it usually decreases with the increase of 
m. The total volume occupied by the Arnold web is expected 
to scale as -y/s^. 

The appearance of the Arnold diffusion may take a while. 
The motion along the web can be detected only when the 
change AP(t) — ||P(t) — P(0)|| assumes a noticeable value. 



The Nekhoroshev theorem predicts that such a change can be 
observed after a time Ia, which scales at least exponentially 
with l/eS. Moreover, only a tiny fraction of the initial con- 
ditions, which entered into the manifold Hweb, can diffuse. 
Therefore, even the numerical detection of the Arnold diffu- 
sion is a rather difficult and time-consuming tas k'^'^^ . 

Here, we consider a perturbation Hamiltonian of the ex- 
plicit form: 



Hp[x, y, z) = cos(a;) cos(y)[l + cos(2z)]. 



(2) 



The total Hamiltonian ii(P,X) in eqs. ([T] [2]) cannot be 
separated into several independent two- or one-dimensional 
Hamiltonians; therefore the system is manifestly three- 
dimensional. The system in eqs. (1-2) falls within the 
range of Nekhoroshev theorem since the the Hamiltonian Hq 
is convex and the function Hp{X.) is analytic. Moreover, the 
perturbation potential (|2]i is bounded, \Hp{X.)\ < 2, and the 
total system energy, E = i7(P,X), effectively controls the 
strength of perturbation. Therefore we set e = 1 and intro- 
duce an effective perturbation parameter 



£eff = max \Hp{X)\/H(P, X) = 2/E 



(3) 



In the momentum subspace the system evolves within the 
sphere S. Namely, the system dynamics is confined to a 
thin layer enclosing the surface of the sphere S, Le(P)'- 
2{E - Eeff) < P^ < 2(£' + Eeff). Resonance lines m • P = 
form a set of circles on the surface of the sphere. In the limit 
£eff ^ 1, the Arnold web represents a sparse net on S. Upon 
increasing the perturbation strength, egff, the Arnold web tends 
to become 'thicker' : the invariant tori, which were located far 
outside of the low-order resonances (the distance is defined by 
the Diophantine resonance conditions^'), and were unaffected 
by the weak perturbation, become now modified. The increas- 
ing perturbation involves more and more resonances into the 
growing Arnold web. This all results in the appearance of a 
complex, fine-structured network braiding whole areas on the 
surface of the sphere. 



III. COMPUTATIONAL METHOD 

Numerical studies of Arnold diffusion demand long runs. 
Therefore, one should resort to the integrators which con- 
sistently respect the symplectic properties of the original 
systems^. A symplectic numerical scheme replaces the orig- 
inal, continuous-time Hamiltonian by its discretized version, 
and the numerical propagation produces the exact time evo- 
lution of the corresponding Hamiltonian map2i. The so dis- 
cretized system dynamics thus is a slightly perturbed version 
of the original one, so that there is no secular growth of the 
system integrals of motions, which is the total system energy 
E in our case. The smaller the time step of the corresponding 
Hamiltonian map is, the closer both systems, original and dis- 
cretized version stay close to each other. For the propagation 
of the Hamiltonian in eqs. (1 - 2) we used the sixth-order sym- 
plectic integrator from Ref.^. For the time step h — 10 ^^T, 
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FIG. 1: Performance of a central processing unit (CPU) of the 
Augsburg University computational cluster (Intel Xeon Processor 
2.93GHz Quad-Core X5570 Gainestown) versus the GPU (NVIDIA 
TESLA M2050) for the propagation of an ensemble of Hamiltonian 
systems (T] [2li: the overall calculation time is depicted as a func- 
tion of the number of realizations N (see text for more details). The 
propagation time of a single realization was t = 5000T. 



where the timescale is determined by the characteristic fre- 
quency of the particle oscillations at the bottom of the poten- 
tial well, T = 27r, the absolute error in the system energy, 
AE{t) = \E{t) - E{0)\, did remain below 10"* during the 
whole integration time. 

All our calculations have been performed on a NVIDIA 
TESLA M2050 GPU, with 448 processing units on board, and 
by using CUDA and double-precision accuracy. We employed 
the complete parallelization, which allowed us in total to run 
TV = 10^ realizations simultaneously. The performance gain 
is shown in Figure 1. While the calculation time on a sin- 
gle stack-structured CPU grows linearly with N. At the same 
time, up to A^ = 7168 realizations can be calculated on a GPU 
simultaneously, due to the distribution procedure performed 
by CUDA. This number is a multiple of the number of GPU- 
cores, 7168 = 16 x 448. A further increase of the number 
of realizations causes a re-distribution of threads between the 
fixed number of cores and slow down the calculations speed, 
note the steps on the corresponding curve in Figure 1 . 




FIG. 2: (a) Arnold web. Probability density, FAt(V; 0) (see text), 
of the average velocity over At = 50T of the system in eqs. (1- 
2) starting from uniformly distributed initial conditions with a total 
energy E = 400. The braided stripes correspond to the resonances 
m ■ P = 0. The width of a stripe depends on the order of the res- 
onance: the higher the order, the narrower is the stripe. Insets (b) 
and (c): These depict zooms of the corresponding resonance inter- 
sections. The inner stripe regions correspond to chaotic channels of 
the Arnold web, see the main text for more details. 



IV. MAIN RESULTS OF THE SIMULATIONS 

In order to visualize the Arnold web we employed the fol- 
lowing procedure. After having the initial conditions dis- 
tributed over a certain region of the system phase space the 
ensemble of trajectories was launched. The location and the 
shape of the distribution are tunable, so that we can scan dif- 
ferent regions of the phase space at will. For every trajec- 
tory from the ensemble, {X'(i)}, i = 1,..,A^, we calcu- 
late the vector of the averaged finite-time velocity, V (t) = 
{V;^{t),Vj^{t),V^{t)), with the components reading: V^{t) = 
[X^(i + At) — X^{t)]/At, wherein the averaging interval is 
given by Ai. In other words, it corresponds to a point on S?^. 
By running many such realizations in parallel, we collected 
the statistics, and finally projected a probability distribution 
function, FAt(V;t), on the surface of the sphere S. - The 
distribution is a nonstationary function, in the sense that its 



shape depends on t, so that the distribution profile reflects the 
dynamics of the ensemble. 

If one of the ensemble trajectories was launched from a 
region filled with regular trajectories, it remains at the close 
vicinity of the initial momentum vector There is no mixing in 
regular regions. Therefore the projection of the correspond- 
ing part of the initial probability density distribution onto the 
momentum sphere remains invariant. When one of the en- 
semble trajectories entered the thin chaotic layer along a reso- 
nance m, it stays within the channel for some time ("sticking 
event"), and during this time the system moves with the veloc- 
ity vector, P, which nearly exactly obeys the resonance con- 
dition, m • P w 0. All the realizations within the chaotic layer 
contribute to the probability distribution, FAt(V; t), concen- 
trated on the corresponding resonance. The resonance ap- 
pears as a bright line enclosed by an empty 'dark zone', and 



the width of the dark zone corresponds to the width of the 
web around the resonance. An increase of the averaging time 
will induce further localization of the distribution at the cor- 
responding resonance lines, but the widths of channels, i. e. 
dark zones on the momentum sphere, are fixed and do not de- 
pend on At. Therefore, the color representation of FAt(V; t) 
allows one to clearly identify the location of the Arnold chan- 
nels. 

At this point, it is worth to refer to the two-dimensional 
limit of the Hamiltonian ([T]-|2]i. The momentum sphere repre- 
sents a circle in this limit, and alternating chaotic and regular 
zones provide a full partition of the circle into a set of sec- 
tors. A trajectory, being placed initially into one of the zones, 
stays forever within the corresponding sector Each chaotic 
zone can be characterized by the corresponding asymptotic 
velocity, V = {V^,Vy). When the averaging time At ap- 
proaches infinity, the distribution function inside the corre- 
sponding chaotic sector shrinks to the point V. Therefore, in 
the asymptotic limit sectors, corresponding to chaotic zones 
look like dark regions with bright points at their centers. 

The averaging time interval At is tunable and there exists 
no a priori best choice for it. Namely, every resonance chan- 
nel m can be characterized by a probability distribution of 
the corresponding sticking times, "^mlistick)- The distribu- 
tion always possesses a finite first moment--, a mean stick- 
ing time <stick(m) = /q T-0i„(T)dr. If the averaging time 
interval is much larger than this mean sticking time, the cor- 
responding resonance channel cannot be resolved. If, on the 
contrary. At ^ <stick(ni), the oscillations of the momentum 
vector P(m) along the high-order resonance, with m ^ I, 
will smear the probability density over a broad region thus 
preventing a resolution of the fine structure of the web. In the 
following we used the value At = SOT. 

The high degree of parallelism capability of the GPU offers 
a massive speedup for the ensemble propagation, see in Fig- 
ure 1 . With our Figures 2 and 3 we show the results of our 
numerical experiment with N w 10^ independent realizations 
and the propagation time has been set at tp = At. The overall 
simulation time of each experiment assumed 45 minutes. For 
a very long time evolution, see in Figure 4, we propagated an 
ensemble of iV = 3.2 • 10** particles up to a time tp ~ lO^T. 
The simulation time in this case was 24 hours. The standard 
CPU-based run of a numerical experiment of the same scale 
would require (i) approximately one year of calculations on a 
standard desktop PC or (ii) about five months on a more ad- 
vanced CPU of the Augsburg University computational clus- 
ter 

The results corresponding to the weak perturbative regime 
at a total energy E ~ 400, corresponding to eeff = 0.005, are 
shown in Figure 2. We used an ensemble with the initial con- 
ditions uniformly distributed over the sphere S, and over the 
torus [0, 27r] x [0, 27r] x [0,27r] in the coordinate space Ox. 
The most of the sphere surface is occupied by regular trajecto- 
ries, therefore the initial uniform distribution remains invari- 
ant almost everywhere, representing the uniform dark orange 
area on Figure 2a. The brightest narrow stripes which braid 
the sphere correspond to the lower order, m < 2, resonances. 
We also scanned the energy shell by using an ensemble of 



trajectories with the initial conditions uniformly distributed 
within a small segment of the sphere (enclosed by the two 
squares in Fig. 2 (a), with the aim to resolve the presence of 
the Arnold web around some higher-order resonances. The 
results, depicted in Figs. 2 (b,c), show that the structures of 
the resonance intersections are topologically similar. 

The results for the regime of moderate perturbation with to- 
tal energy E = 15; i.e. Seff = 2/15 , are depicted with Fig. 3. 
Although the perturbation is much stronger now, it is not yet 
strong enough so as to destroy all the resonance tori and bring 
the system into the regime of the global Chirikov diffusion^^. 
There are several patches of the resonance web, which form 
leaf-like clusters on the surface of the sphere. The zooms into 
clusters reveal a fine fractal structure, see Fig. 3 (b,c). The 
clusters are formed by higher-order resonances, and the width 
of the corresponding channels scales with the increase of the 
resonance orders. Therefore, both the number of realizations. 




FIG. 3: Arnold web. Probability density, -FAt(V;0) (see text), of 
the average velocity over At = 50T of the system in eqs. (1-2) 
starting from uniformly distributed initial conditions with a total en- 
ergy E = 15. The phase space is pierced by the resonance channels 
of different orders, which foiTn fine structured patches on the velocity 
sphere, see the sequel of insets from (a) — >■ (b) — >■ (c). The bright ar- 
eas correspond to the zones of well-developed chaos. For the highest 
resolution in panel (c) we double the averaging interval At = lOOT. 
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FIG. 4: Arnold diffusion. The time evolution of tlie probability density, -fAt(V; i), depicts the spreading of the ensemble of Ai" = 3.2 ■ 10* 



trajectories localized initially at the intersection of three major resonances, W = fi(i._i_2)nf2(i 
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The initial cloud assumes 



a Gaussian profile in the velocity space, with the center at the point W and dispersions CTi,^ — avy = 0.01, and a uniform distribution over 
the torus [0, 2-k] x [0, 27r] x [0, 27r] in the coordinate space, fix. The corresponding initial velocity Vz was calculated from the fixed energy 
condition E — 15. To resolve the fine structure of the web with a relatively small ensemble we averaged the probability density F over the 
whole time span between consecutive plots. Altough the radius of all spheres in the momentum space is the same, V2E, we consequently 
increased their sizes in order to resolve the fine structure of the developing Arnold web. 



N{m), and the averaging time, At{m), needed to resolve the 
web structure formed around the resonances of order m, grow 
quickly with rn. In order to overcome this obstacle we scan 
the region of interest with the ensemble launched within the 
designated area, and then tracked only those trajectories which 
remained within the area during the whole observation time. 
The results are shown in Figures 3 (b,c). 

Finally, we computed the diffusive spreading of the ensem- 
ble initially injected at the point of intersection of three low- 
order resonances, w = 0(1 _i 2) n i^(i.-i,-2) n i^(i.-i,o) — 

{\/E, VE, 0), which is at the center of a well-developed 
chaotic region. The evolution dynamics proceeds in a step- 
wise manner. In a first stage we detect a fast spreading over the 
chaotic region, which encloses the injection point. Then the 
spreading slows down, and eventually enters a second stage 
of slow diffusion through the Arnold web. Both stages are 
illustrated on Figure 4^"^. The system phase space is not uni- 
form so that two dynamical stages correspond to two different 
regions of the energy shell. The region at the vicinity of the in- 



tersections of the primary resonances is strongly chaotic, and 
the dynamics there is governed by the fast Chirikov diffusion. 
The rest of the shell is well-structured by the KAM tori, and 
the evolution there is mediated by the Arnold web. Fractal 
clusters play the role of hubs between two regions. 



V. CONCLUSIONS 

With this numerical study we have shown how a GPU su- 
percomputer can be used to explore the Arnold diffusion in 
near-integrable Hamiltonian systems. The appearance of the 
Arnold diffusion demands the evaluation of huge statistical 
data sets. It may be considered as a typical problem of sam- 
pling of rare eventsi^. Namely, once the system got into a 
narrow resonance channel, it stayed there for a while before 
making a transition to another channel. These transitions con- 
stitute rare events, which in fact determine the long-time evo- 
lution of the system. The disparity of the involved time scales. 



which strongly depend on the resonance orders, makes the 
mapping of the diffusion web a very challenging computa- 
tional task. The use of the CUDA-based NVIDIA platform 
led to speedups by the factors ~ 350 and ~ 100, as compared 
with the performances of a standard PC and of a computa- 
tional cluster's CPU, correspondingly. 

When compared to other, more sophisticated methodsi^^, 
our approach to the visualization of the Arnold diffusion in 
three-dimensional autonomous Hamiltonian systems revails 
some advantages, both in theoretical and experimental do- 
mains. First, our scheme neither demands pre- nor after- 
processing but only a straightforward integration of the cor- 
responding equations of motion. Second, it allows for a 
direct extension to the quantum limit while it is not at all 
clear how one could generalize the concept of finite-time 
Lyapunov exponent a^^''^-'^ or the frequency analysisi^ to the 
Schroedinger equation. 

The state-of-the-art experiments with ultracold atoms pro- 
vide a natural playground for the realization of our method. 
The creation of three-dimensional periodic optical potentials 
nowadays become an experimental routine^. The initial en- 
semble in a form of narrow distribution over a certain man- 
ifold in the three-dimensional momentum space can be pre- 
pared by using a diluted BEC cloud and the Bragg selection 
techniqua^ii^. The needed time. At < lOOT, where T is the 
period of oscillations at the bottom of potential well, is sev- 
eral orders of magnitude smaller then the characteristic deco- 
herence time^. Therefore, the evolution of ultracold atoms 



is Hamiltonian. The instantaneous momentum distribution 
can be measured by using the standard time-of-flight measure- 
ment technique^. Finally, a tunable effective Planck constant 
allows to probe both the semiclassical and the deep quantum 
limits. 

Another intriguing application that comes to mind is the im- 
plementation of the GPU-based mapping procedure to search 
for the footprints of the Arnold diffusion in the emission pat- 
terns of three-dimensional optical resonators^. 

Our work here illustrates the advantages, provided by GPC 
- GPU ideology, for nonlinear dynamics studies by using a 
particular example. Yet the potential of this approach reaches 
far beyond: it has already been put to work to explore (i) the 
noisy phase dynamics in a Josephson junction and the noisy 
Kuramoto mode^, (ii) the long-time evolution of nonlinear 
lattices^, and (iii) the functioning of inertial Brownian motors 
that are driven by colored noise'' -. We thus believe that the use 
of GPU computing in nonlinear science is only at a beginning: 
in the immediate future its great potential likely will provide 
many unforeseen findings and possibly even unravel manifest 
new phenomena. 
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